Not Just Learning from Others but Relying on Yourself: A New Perspective on Few-Shot Segmentation in Remote Sensing
Few-shot segmentation (FSS) aims to segment targets of unseen classes with only a few annotated samples. Most current FSS methods follow the paradigm of
mining the semantics from the support images to guide the query image
segmentation. However, such a pattern of `learning from others' struggles to
handle the extreme intra-class variation, preventing FSS from being directly
generalized to remote sensing scenes. To bridge the gap of intra-class
variance, we develop a Dual-Mining network named DMNet for cross-image mining
and self-mining, meaning that it no longer focuses solely on support images but
pays more attention to the query image itself. Specifically, we propose a
Class-public Region Mining (CPRM) module to effectively suppress irrelevant
feature pollution by capturing the common semantics between the support-query
image pair. The Class-specific Region Mining (CSRM) module is then proposed to
continuously mine the class-specific semantics of the query image itself in a
`filtering' and `purifying' manner. In addition, to prevent the co-existence of
multiple classes in remote sensing scenes from exacerbating the collapse of FSS
generalization, we also propose a new Known-class Meta Suppressor (KMS) module
to suppress the activation of known-class objects in the sample. Extensive
experiments on the iSAID and LoveDA remote sensing datasets demonstrate that our method sets a new state of the art with the fewest model parameters. Notably, our model with a ResNet-50 backbone achieves mIoU scores of 49.58% and 51.34% on iSAID under the 1-shot and 5-shot settings, outperforming the previous state-of-the-art method by 1.8% and 1.12%, respectively. The code is publicly available at https://github.com/HanboBizl/DMNet.
Comment: accepted to IEEE TGR
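The `learning from others' paradigm that DMNet moves beyond typically extracts a class prototype from the support image and matches it against query features. A minimal NumPy sketch of that generic baseline (masked average pooling plus cosine matching; an illustration of the common pipeline, not DMNet's actual CPRM/CSRM modules):

```python
import numpy as np

def masked_average_pooling(feat, mask):
    """Generic FSS building block: average support features over the
    annotated foreground mask to obtain a class prototype vector.
    feat: (C, H, W) support feature map; mask: (H, W) binary mask."""
    mask = mask.astype(feat.dtype)
    denom = mask.sum() + 1e-8
    return (feat * mask[None]).sum(axis=(1, 2)) / denom

def cosine_similarity_map(feat, prototype):
    """Compare every query-feature location with the prototype; high
    scores indicate likely target regions."""
    f = feat / (np.linalg.norm(feat, axis=0, keepdims=True) + 1e-8)
    p = prototype / (np.linalg.norm(prototype) + 1e-8)
    return np.einsum('chw,c->hw', f, p)
```

Under extreme intra-class variation, a prototype pooled only from support pixels matches query objects poorly, which is exactly the failure mode the abstract's self-mining of the query image is meant to address.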
Semantic Segmentation for Point Cloud Scenes via Dilated Graph Feature Aggregation and Pyramid Decoders
Semantic segmentation of point clouds yields a comprehensive understanding of scenes by densely predicting the category of each point. Because a single, fixed receptive field struggles to express multi-receptive-field features, instances with similar spatial structures are easily misclassified. In
this paper, we propose a graph convolutional network DGFA-Net rooted in dilated
graph feature aggregation (DGFA), guided by multi-basis aggregation loss
(MALoss) calculated through Pyramid Decoders. To configure multi-receptive
field features, DGFA which takes the proposed dilated graph convolution
(DGConv) as its basic building block, is designed to aggregate multi-scale
feature representation by capturing dilated graphs with various receptive
regions. To diversify the receptive-field bases, we further introduce Pyramid Decoders driven by MALoss, which penalize the receptive-field information using point sets of different resolutions as calculation bases. Combining these two aspects, DGFA-Net significantly improves the
segmentation performance of instances with similar spatial structures.
Experiments on S3DIS, ShapeNetPart and Toronto-3D show that DGFA-Net
outperforms the baseline approach, achieving a new state-of-the-art
segmentation performance.
Comment: accepted to AAAI Workshop 202
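The dilation idea behind DGConv can be pictured with a toy neighbor-selection routine. The sketch below shows one common realization of dilated k-NN (an assumed form, not the paper's exact operator): from the k*d nearest neighbors of each point, keep every d-th one, enlarging the receptive region without growing the neighborhood size k.

```python
import numpy as np

def dilated_knn(points, k, dilation):
    """Select k dilated neighbors per point: take the k*dilation nearest
    neighbors (excluding the point itself), then keep every
    `dilation`-th one.  points: (N, D) coordinates."""
    # Pairwise squared distances, (N, N).
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    order = np.argsort(d2, axis=1)            # column 0 is the point itself
    candidates = order[:, 1:1 + k * dilation]
    return candidates[:, ::dilation]          # (N, k) dilated neighbors
```

With dilation=1 this reduces to ordinary k-NN; larger dilation samples sparser, farther neighborhoods, which is how a dilated graph widens its receptive region.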
Breaking Immutable: Information-Coupled Prototype Elaboration for Few-Shot Object Detection
Few-shot object detection, which expects detectors to detect novel classes from only a few instances, has made notable progress. However, the prototypes extracted
by existing meta-learning based methods still suffer from insufficient representative information and lack awareness of query images, so they cannot be adaptively tailored to different query images. Firstly, only the support images are involved in extracting prototypes, resulting in scarce perceptual information about query images. Secondly, all pixels of all support images are treated equally when aggregating features into prototype vectors, so the salient objects are overwhelmed by the cluttered background. In this paper, we
propose an Information-Coupled Prototype Elaboration (ICPE) method to generate
specific and representative prototypes for each query image. Concretely, a
conditional information coupling module is introduced to couple information
from the query branch to the support branch, strengthening the query-perceptual
information in support features. Besides, we design a prototype dynamic
aggregation module that dynamically adjusts intra-image and inter-image
aggregation weights to highlight the salient information useful for detecting
query images. Experimental results on both Pascal VOC and MS COCO demonstrate
that our method achieves state-of-the-art performance in almost all settings.
Comment: Accepted by AAAI202
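The intuition behind query-conditioned prototype aggregation can be sketched as follows; the weighting scheme here (scaled dot-product scores over flattened support pixels, turned into softmax weights) is an illustrative assumption, not ICPE's actual dynamic aggregation module.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dynamic_prototype(support_feats, query_vec):
    """Query-conditioned aggregation: instead of averaging all support
    pixels equally, weight each support feature by its similarity to a
    query descriptor, so salient, query-relevant pixels dominate.
    support_feats: (N, C) flattened support pixels; query_vec: (C,)."""
    scores = support_feats @ query_vec / np.sqrt(support_feats.shape[1])
    weights = softmax(scores)          # per-pixel aggregation weights
    return weights @ support_feats     # (C,) prototype
```

Uniform averaging is the degenerate case of equal weights; making the weights depend on the query is what lets the prototype adapt per query image instead of staying immutable.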
Elevation Estimation-Driven Building 3D Reconstruction from Single-View Remote Sensing Imagery
Building 3D reconstruction from remote sensing images has a wide range of
applications in smart cities, photogrammetry and other fields. Methods for automatic 3D urban building modeling typically take multi-view images as input to recover point clouds and 3D models of buildings. However, such methods rely heavily on multi-view imagery of buildings, which is time-intensive to acquire and limits their applicability and practicality. To
solve these issues, we focus on designing an efficient DSM estimation-driven
reconstruction framework (Building3D), which aims to reconstruct 3D building
models from the input single-view remote sensing image. First, we propose a
Semantic Flow Field-guided DSM Estimation (SFFDE) network, which utilizes the
proposed concept of elevation semantic flow to achieve the registration of
local and global features. Specifically, to make the network's semantics globally aware, we propose an Elevation Semantic Globalization (ESG) module to realize the semantic globalization of instances. Further, to bridge the semantic gap between global features and the original local features, we propose a Local-to-Global Elevation Semantic Registration (L2G-ESR) module based on
elevation semantic flow. Our Building3D is rooted in the SFFDE network for
building elevation prediction, synchronized with a building extraction network
for building masks, and then sequentially performs point cloud reconstruction,
surface reconstruction (or CityGML model reconstruction). On this basis, our
Building3D can optionally generate CityGML models or surface mesh models of the
buildings. Extensive experiments on ISPRS Vaihingen and DFC2019 datasets on the
DSM estimation task show that our SFFDE significantly improves upon state-of-the-art methods. Furthermore, our Building3D achieves impressive results in both 3D point cloud and 3D model reconstruction.
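The point cloud reconstruction step that follows DSM estimation amounts to back-projecting per-pixel elevations into 3D. A hedged sketch of that step (simple orthographic back-projection under an assumed ground sample distance, not the paper's exact implementation):

```python
import numpy as np

def dsm_to_point_cloud(dsm, mask, gsd=1.0):
    """Back-project a predicted DSM (per-pixel elevation) into a 3D
    building point cloud: pixel coordinates scaled by the ground sample
    distance (gsd, metres per pixel) give X and Y; the elevation gives
    Z.  Only pixels inside the building mask are kept.
    dsm: (H, W) elevations; mask: (H, W) binary building mask."""
    ys, xs = np.nonzero(mask)
    return np.stack([xs * gsd, ys * gsd, dsm[ys, xs]], axis=1)  # (N, 3)
```

The resulting points would then feed surface reconstruction (e.g. meshing) or CityGML model generation, as the pipeline in the abstract describes.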
RFA-Net: Reconstructed Feature Alignment Network for Domain Adaptation Object Detection in Remote Sensing Imagery
With the development of deep learning, great progress has been made in object detection for remote sensing (RS) imagery. However, object detectors struggle to generalize from one labeled dataset (source domain) to another unlabeled dataset (target domain) due to the discrepancy in data distribution (domain shift). Currently, adversarial-based domain adaptation methods align the semantic features of the source and target domains to alleviate the domain shift. However, they fail to avoid aligning noisy background features and neglect instance-level features, making them ill-suited to detection models that focus on instance localization and classification. To mitigate the domain shift in object detection, we propose a reconstructed feature alignment network (RFA-Net) for unsupervised cross-domain object detection in RS imagery. The RFA-Net includes a sequential data augmentation module deployed at the data level to provide solid gains on unlabeled data, a sparse feature reconstruction module deployed at the feature level to intensify instance features for feature alignment, and a pseudo-label generation module deployed at the label level to supervise the unlabeled target domain. Extensive experiments illustrate that our proposed RFA-Net effectively alleviates the domain shift problem in domain-adaptive object detection for RS imagery.
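The core idea of a pseudo-label generation module, keeping only confident detections on the unlabeled target domain as supervision, can be sketched minimally; the threshold value and the detection tuple format below are assumptions, not RFA-Net's actual interface.

```python
def filter_pseudo_labels(detections, score_thresh=0.8):
    """Confidence-based pseudo-label filtering: keep only detections
    whose score clears the threshold, and use their boxes and class ids
    as supervision for the unlabeled target domain.
    detections: list of (box, class_id, score) tuples."""
    return [(box, cls) for box, cls, score in detections
            if score >= score_thresh]
```

The threshold trades label quantity against label noise: set too low, background false positives leak into training; set too high, few target-domain boxes remain to learn from.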
Aircraft Reconstruction in High Resolution SAR Images Using Deep Shape Prior
Object reconstruction is of vital importance in Synthetic Aperture Radar (SAR) image analysis. In this paper, we propose a novel shape-prior-based method to reconstruct aircraft in high-resolution SAR images. The method comprises two stages. In the shape prior modeling stage, a generative deep learning method is used to model deep shape priors; a novel framework is then proposed in the reconstruction stage, which integrates the shape priors into the reconstruction process. Specifically, to address object rotation, a novel pose estimation method is proposed to obtain candidate poses, avoiding an exhaustive search over all poses. In addition, an energy function combining a scattering region term and a shape prior term is proposed and optimized via an iterative algorithm to achieve object reconstruction. To the best of our knowledge, this is the first attempt to reconstruct objects with complex shapes in SAR images using deep shape priors. Experiments conducted on a dataset acquired by TerraSAR-X demonstrate the accuracy and robustness of the proposed method.
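The reconstruction stage's energy minimization can be pictured with a generic two-term objective and a simple iterative minimizer. The concrete term functions, the weight `lam`, and the coordinate-descent optimizer below are placeholders standing in for the paper's formulation, not its actual algorithm.

```python
import numpy as np

def energy(shape_params, scatter_term, prior_term, lam=0.5):
    """Two-term energy of the kind described in the abstract: a data
    (scattering region) term plus a weighted shape-prior term.  Both
    terms are callables of the shape parameters."""
    return scatter_term(shape_params) + lam * prior_term(shape_params)

def coordinate_descent(f, x0, step=0.1, iters=100):
    """Generic iterative minimizer: greedily nudge each parameter by
    +/-step while the energy decreases."""
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        for i in range(x.size):
            for delta in (step, -step):
                y = x.copy()
                y[i] += delta
                if f(y) < f(x):
                    x = y
    return x
```

The weight balances fidelity to the observed scattering regions against plausibility under the learned shape prior; the minimizer only needs function evaluations, which suits non-differentiable scattering terms.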
AF-EMS Detector: Improve the Multi-Scale Detection Performance of the Anchor-Free Detector
As a precursor step for computer vision algorithms, object detection plays an important role in various practical application scenarios. As the objects to be detected become more complex, the problem of multi-scale object detection has attracted increasing attention, especially in remote sensing. Early convolutional neural network detection algorithms mostly rely on artificially preset anchor boxes to divide the image into regions and obtain prior positions of targets. However, anchor boxes are difficult to set reasonably and cause substantial computational redundancy, which limits the generality of detection models obtained under fixed parameters. In the past two years, anchor-free detection algorithms have achieved remarkable progress on natural images. However, there has been insufficient research on handling multi-scale detection more effectively within the anchor-free framework and on applying these detectors to remote sensing images. In this paper, we propose a specific-attention Feature Pyramid Network (FPN) module that generates a feature pyramid based on the characteristics of objects of various sizes, better suiting multi-scale object detection. In addition, we propose a scale-aware detection head containing a multi-receptive feature fusion module and a size-based feature compensation module. The resulting anchor-free detector obtains a more effective multi-scale feature expression. Experiments on challenging datasets show that our approach performs favorably against other methods in terms of multi-scale object detection performance.
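Size-based routing of objects to pyramid levels, the starting point that a scale-aware head refines, is commonly done with the standard FPN assignment heuristic. A sketch of that widely used heuristic (not the proposed specific-attention module), where larger objects map to coarser pyramid levels:

```python
import math

def assign_pyramid_level(box_size, canonical=224.0, k0=4,
                         k_min=3, k_max=7):
    """Standard FPN-style level assignment: an object of size
    `canonical` (sqrt of box area, in pixels) maps to level k0, and
    each doubling of size moves one level coarser; the result is
    clamped to the available pyramid levels [k_min, k_max]."""
    k = k0 + math.floor(math.log2(box_size / canonical))
    return max(k_min, min(k_max, k))
```

Remote sensing scenes stress exactly this mapping: object sizes span orders of magnitude, so many targets pile up at the clamped extremes, which motivates the multi-receptive fusion and size-based compensation modules described above.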